LST followups: better work divisions, concrete kernel dimension, some cleanup and fixes #47084

ariostas · 2025-01-10T19:17:40Z

This PR addresses some of the LST followups that we have listed in #46746.

Here is the list of fixes/changes:

Better work division: we switched to using cms::alpakatools::make_workdiv (instead of our custom createWorkDiv) and we now use cms::alpakatools::uniform_elements for kernel loops.
We switched to explicitly specifying kernel dimensions instead of using templated types.
Started removal of kVerticalModuleSlope (previously named lst_INF). We're doing this in two steps instead of one since the data files also need to be updated. We ensure a smooth transition by first supporting both options and later removing the legacy one.
We fixed some issues with our includes and an overflow that occasionally occurred.
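
As a rough illustration of the work-division item above, here is a minimal sketch of a strided element loop in plain C++. It does not use Alpaka: `WorkDiv1D`, `stridedElements` and `coverage` are hypothetical names invented for this example, and the assumption is that each thread starts at its global index and advances by the total grid size, so that any extent is covered regardless of the chosen number of blocks and threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 1D work division (not the Alpaka type).
struct WorkDiv1D {
  std::size_t blocks;   // blocks per grid
  std::size_t threads;  // threads per block
};

// Indices in [0, extent) handled by one (block, thread) pair when every
// thread starts at its global index and advances by the total grid size.
inline std::vector<std::size_t> stridedElements(WorkDiv1D wd, std::size_t block, std::size_t thread,
                                                std::size_t extent) {
  assert(block < wd.blocks && thread < wd.threads);
  std::vector<std::size_t> visited;
  const std::size_t gridSize = wd.blocks * wd.threads;
  for (std::size_t i = block * wd.threads + thread; i < extent; i += gridSize) {
    visited.push_back(i);
  }
  return visited;
}

// Number of times each index is visited when all threads of the grid run the loop.
inline std::vector<int> coverage(WorkDiv1D wd, std::size_t extent) {
  std::vector<int> hits(extent, 0);
  for (std::size_t b = 0; b < wd.blocks; ++b) {
    for (std::size_t t = 0; t < wd.threads; ++t) {
      for (std::size_t i : stridedElements(wd, b, t, extent)) {
        ++hits[i];
      }
    }
  }
  return hits;
}
```

With this pattern the kernel body no longer depends on the grid exactly matching the problem size: a grid smaller than the extent simply iterates more than once, and threads beyond the extent do nothing.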

c.c. @slava77 @VourMa

cmsbuild · 2025-01-10T19:18:07Z

cms-bot internal usage

cmsbuild · 2025-01-10T19:20:08Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47084/43263

cmsbuild · 2025-01-10T19:20:34Z

A new Pull Request was created by @ariostas for master.

It involves the following packages:

RecoTracker/LSTCore (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @felicepantaleo, @gpetruc, @missirol, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

ariostas · 2025-01-10T19:23:06Z

Tagging @fwyzard since most (if not all) of the comments addressed were his

slava77 · 2025-01-10T19:28:57Z

test parameters:

enable_tests = gpu
workflows_gpu = 29634.704,29834.704
workflows = 29634.703,29834.703
relvals_opt = -w upgrade,standard
relvals_opt_gpu = -w upgrade,standard

slava77 · 2025-01-10T19:30:07Z

@cmsbuild please test

cmsbuild · 2025-01-10T21:57:04Z

-1

Failed Tests: UnitTests RelVals-GPU
Size: This PR adds an extra 104KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1eb2fd/43723/summary.html
COMMIT: 1a27b2a
CMSSW: CMSSW_15_0_X_2025-01-10-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47084/43723/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 1 errors in the following unit tests:

---> test test-das-selected-lumis had ERRORS

RelVals-GPU

29834.704 29834.704_TTbar_14TeV+Run4D110PU_lstOnGPUIters01TrackingOnly/step3_TTbar_14TeV+Run4D110PU_lstOnGPUIters01TrackingOnly.log

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 8 differences found in the comparisons
DQMHistoTests: Total files compared: 52
DQMHistoTests: Total histograms compared: 3996179
DQMHistoTests: Total failures: 64
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3996095
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 51 files compared)
Checked 226 log files, 195 edm output root files, 52 DQM output files
TriggerResults: no differences found

slava77 · 2025-01-10T22:12:55Z

29834.704 29834.704_TTbar_14TeV+Run4D110PU_lstOnGPUIters01TrackingOnly/step3_TTbar_14TeV+Run4D110PU_lstOnGPUIters01TrackingOnly.log

there are a bunch of errors like

alpaka/event/EventUniformCudaHipRt.hpp(66) 'TApi::eventDestroy(m_UniformCudaHipEvent)' returned error : 'cudaErrorIllegalAddress': 'an illegal memory access was encountered'!

the same workflow step3 in the baseline ran OK. So, the crash seems related to this PR.

jfernan2 · 2025-01-13T16:11:38Z

assign heterogeneous

cmsbuild · 2025-01-13T16:12:04Z

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

RecoTracker/LSTCore/interface/Circle.h

RecoTracker/LSTCore/interface/alpaka/Common.h

RecoTracker/LSTCore/src/alpaka/PixelTriplet.h

fwyzard · 2025-01-27T13:45:41Z

I can have a look in the coming days, but if this is urgent for any reason go ahead and sign it, and I will still have a look after the fact.

fwyzard · 2025-01-27T13:46:43Z

hold

@cms-sw/heterogeneous-l2
(6 days after the last update resolving available comments)
please clarify on the status of your review or the expected signoff time.
Thank you.

Actually, you know what ?
I will review it when I have the time.

cmsbuild · 2025-01-27T13:47:04Z

Pull request has been put on hold by @fwyzard
They need to issue an unhold command to remove the hold state or L1 can unhold it for all

fwyzard · 2025-01-27T16:47:47Z

RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc

-  Vec3D const blocksPerGrid_crossCleanpT3{1, 4, 20};
-  WorkDiv3D const crossCleanpT3_workDiv =
-      createWorkDiv(blocksPerGrid_crossCleanpT3, threadsPerBlock_crossCleanpT3, elementsPerThread);
+  auto const crossCleanpT3_workDiv = cms::alpakatools::make_workdiv<Acc2D>({20, 4}, {64, 16});


Here the X and Y values in the ranges are inverted with respect to before - is it intended ?

Yeah, there were a few places where I flipped the order so that the loops are nested in the recommended order.
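
To make the reordering concrete, here is a minimal sketch in plain C++ (no Alpaka). It assumes the convention that the last component of an extent vector is the fastest-varying ("x") dimension, so component 0 drives the outer loop and component 1 the inner loop. `Vec2`, `gridThreads` and `coverage2D` are hypothetical names used only for illustration. Under that assumption, {20, 4} blocks of {64, 16} threads give 1280 x 64 threads in total.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Component 0 is the slow ("y") dimension, component 1 the fast ("x") one.
using Vec2 = std::array<std::size_t, 2>;

// Total number of threads along each dimension of the grid.
inline Vec2 gridThreads(const Vec2& blocks, const Vec2& threads) {
  return Vec2{{blocks[0] * threads[0], blocks[1] * threads[1]}};
}

// Visit counts for an nOuter x nInner problem (row-major storage) when every
// thread runs a strided outer loop along dimension 0 and a strided inner loop
// along dimension 1.
inline std::vector<int> coverage2D(const Vec2& blocks, const Vec2& threads, std::size_t nOuter,
                                   std::size_t nInner) {
  const Vec2 grid = gridThreads(blocks, threads);
  assert(grid[0] > 0 && grid[1] > 0);
  std::vector<int> hits(nOuter * nInner, 0);
  for (std::size_t ty = 0; ty < grid[0]; ++ty) {
    for (std::size_t tx = 0; tx < grid[1]; ++tx) {
      for (std::size_t i = ty; i < nOuter; i += grid[0]) {    // outer loop: dimension 0
        for (std::size_t j = tx; j < nInner; j += grid[1]) {  // inner loop: dimension 1
          ++hits[i * nInner + j];
        }
      }
    }
  }
  return hits;
}
```

Swapping the two components of the work division changes which loop gets the larger share of threads, but every (i, j) pair is still visited exactly once.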

fwyzard · 2025-01-27T16:51:36Z

RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc

-  Vec3D const blocksPerGrid_crossCleanpLS{1, 4, 20};
-  WorkDiv3D const crossCleanpLS_workDiv =
-      createWorkDiv(blocksPerGrid_crossCleanpLS, threadsPerBlock_crossCleanpLS, elementsPerThread);
+  auto const crossCleanpLS_workDiv = cms::alpakatools::make_workdiv<Acc2D>({20, 4}, {32, 16});


Also here (OK, so it's probably intended).

Same as above

fwyzard · 2025-01-27T16:54:03Z

RecoTracker/LSTCore/src/alpaka/MiniDoublet.h

-    if (slope ==
-        kVerticalModuleSlope)  // Designated for tilted module when the slope is infinity (module lying along y-axis)
+    if (slope == kVerticalModuleSlope ||
+        edm::isNotFinite(slope))  // Designated for tilted module when the slope is infinity (module lying along y-axis)


@makortel do you know if edm::isFinite/edm::isNotFinite is guaranteed to work in device code?

In theory they could work today: the functions are constexpr and currently use a union for type punning (which is, strictly speaking, undefined behavior). I'd like to replace the union with std::bit_cast, which is constexpr, but on the other hand e.g. https://stackoverflow.com/a/78232359 kind of suggests using cuda::std::bit_cast on CUDA 12.8.

I see that in GCC 12 the std::bit_cast implementation is just a call to __builtin_bit_cast, and that e.g. in https://github.com/cms-sw/cmssw/blob/master/HeterogeneousCore/AlpakaInterface/interface/atomicMaxF.h we use edm::bit_cast (which just forwards to std::bit_cast or __builtin_bit_cast) only for the CPU implementation (I don't remember the exact reason for that though, whether the edm::bit_cast didn't work on device code, or the intrinsics were "easier" on CUDA+HIP).

So perhaps in the long term it would be better to define Alpaka-specific functions (ideally Alpaka could provide a portable bit_cast).
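
For reference, a minimal host-side sketch of a bit-pattern finiteness check. It is not the edm implementation: `isNotFiniteBits`, `isVerticalModule` and `kLegacyVerticalSlope` are hypothetical names, and the sentinel value is a placeholder rather than the real kVerticalModuleSlope. It uses std::memcpy for the type punning so that it builds without C++20; std::bit_cast or __builtin_bit_cast would be the constexpr alternatives mentioned above. Because only integer operations are involved, the result should not depend on floating-point comparison semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// True when x is +-inf or NaN, decided from the IEEE 754 binary32 bit pattern:
// all eight exponent bits set means the value is not finite.
inline bool isNotFiniteBits(float x) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  const std::uint32_t exponentMask = 0x7f800000u;
  return (bits & exponentMask) == exponentMask;
}

// Placeholder sentinel for illustration only (not the real constant's value).
const float kLegacyVerticalSlope = 1.0e9f;

// Transitional check: accept both the legacy sentinel and a non-finite slope.
inline bool isVerticalModule(float slope) {
  return slope == kLegacyVerticalSlope || isNotFiniteBits(slope);
}

// Small usage example.
inline void checkSlopeExamples() {
  assert(isVerticalModule(std::numeric_limits<float>::infinity()));
  assert(isVerticalModule(kLegacyVerticalSlope));
  assert(!isVerticalModule(0.5f));
}
```

Accepting both forms in one predicate mirrors the two-step transition described in the PR: data files can switch from the sentinel to a true infinity before the legacy branch is dropped.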

It currently does work, but I agree it would be better to be more careful about it (in a separate PR?).

On a related note, should I also reimplement std::distance to be safe?

> (in a separate PR?).

To me a separate PR would be fine.

For pointers and random access iterators you can just use b - a instead of std::distance(a, b).

Anyway, from looking at the code a while back, I think std::distance(a, b) should be safe.
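
A minimal sketch of the equivalence for raw pointers, in plain host C++. `countByDistance`, `countBySubtraction` and `checkCounts` are hypothetical helper names; whether std::distance is usable in a given device compiler is not something this example demonstrates.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Element count of [first, last) via the standard library.
inline std::ptrdiff_t countByDistance(const int* first, const int* last) {
  return std::distance(first, last);
}

// Same count via plain pointer subtraction, with no library call involved.
inline std::ptrdiff_t countBySubtraction(const int* first, const int* last) {
  return last - first;
}

// Small usage example.
inline void checkCounts() {
  const int hits[5] = {10, 11, 12, 13, 14};
  assert(countByDistance(hits, hits + 5) == 5);
  assert(countBySubtraction(hits + 1, hits + 4) == 3);
}
```

Note the argument order: std::distance(a, b) corresponds to b - a, not a - b.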

> (I don't remember the exact reason for that though, whether the edm::bit_cast didn't work on device code, or the intrinsics were "easier" on CUDA+HIP).

I'm not sure either.

I've tried writing a simple kernel, and __builtin_bit_cast(float, i) and __int_as_float(i) compile to the same exact PTX code (basically a no-op).

fwyzard · 2025-01-27T16:57:53Z

unhold

slava77 · 2025-01-28T20:05:00Z

unhold

@fwyzard
please clarify if this can be signed now (by you or @makortel ) or if some updates are still needed in the code.
Thank you.

fwyzard · 2025-01-28T20:47:26Z

@slava77 it's not clear to me what is the situation of edm::isFinite / edm::isNotFinite in device code.
If you or @makortel are convinced that it is fine to use it, then we can sign the PR.

slava77 · 2025-01-28T23:13:19Z

> @slava77 it's not clear to me what is the situation of edm::isFinite / edm::isNotFinite in device code. If you or @makortel are convinced that it is fine to use it, then we can sign the PR.

I see that we have it in Patatrack/CA code for half a year

cmssw/RecoTracker/PixelSeeding/plugins/alpaka/CAHitNtupletGeneratorKernelsImpl.h

Lines 523 to 527 in dd230c0

    
    // if the fit has any invalid parameters, mark it as bad
    bool isNaN = false;
    for (int i = 0; i < 5; ++i) {
      isNaN |= edm::isNotFinite(tracks_view[it].state()(i));
    }

#45542 (14_1_X)
is it reasonable to rely on this case to let it go in this PR as well?

makortel · 2025-01-28T23:30:39Z

Could alpaka::math::isfinite() be used here instead? Or is any input expected from fast-math-compiled code?

slava77 · 2025-01-28T23:53:38Z

> Could alpaka::math::isfinite() be used here instead? Or is any input expected from fast-math-compiled code?

we used alpaka::math::isnan before #46857 and that was buggy because of fastmath
Note that this PR is not the first time edm::is*Finite is used in the LST code

fwyzard · 2025-01-29T05:41:17Z

OK, thanks, then I don't see a reason to request further changes here.

I'll see if we can fix alpaka::math::isfinite() for fast math compilation and add a unit test for it.

fwyzard · 2025-01-29T05:41:22Z

+heterogeneous

cmsbuild · 2025-01-29T05:41:45Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @mandrenguyen, @sextonkennedy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

mandrenguyen · 2025-01-29T07:48:14Z

+1

cmsbuild added this to the CMSSW_15_0_X milestone Jan 10, 2025

cmsbuild added reconstruction-pending pending-signatures tests-pending orp-pending code-checks-pending tracking labels Jan 10, 2025

cmsbuild added code-checks-approved and removed code-checks-pending labels Jan 10, 2025

cmsbuild added tests-started and removed tests-pending labels Jan 10, 2025

cmsbuild added tests-rejected and removed tests-started labels Jan 10, 2025

cmsbuild added the heterogeneous-pending label Jan 13, 2025

slava77 mentioned this pull request Jan 13, 2025

Integration PR followups: make_workdiv, uniform_elements, concrete kernel dimensions SegmentLinking/cmssw#141

Merged

makortel reviewed Jan 14, 2025

RecoTracker/LSTCore/interface/Circle.h (outdated, resolved)

RecoTracker/LSTCore/interface/alpaka/Common.h (outdated, resolved)

RecoTracker/LSTCore/src/alpaka/PixelTriplet.h (outdated, resolved)

cmsbuild mentioned this pull request Jan 15, 2025

Replace ALPAKA_STATIC_ACC_MEM_GLOBAL with HOST_DEVICE_CONSTANT #47108

Merged

ariostas added 3 commits January 17, 2025 06:28

Use make_workdiv and uniform_elements

7e5703d

Use int16_t for hitRanges counters

f1a4cc6

Started removal of kVerticalModuleSlope

8e46424

slava77 mentioned this pull request Jan 24, 2025

[DBG_X] RelVals 29634.703, 29834.703 failing with InvalidReference #47176

Open

cmsbuild added the hold label Jan 27, 2025

fwyzard reviewed Jan 27, 2025

cmsbuild removed the hold label Jan 27, 2025

cmsbuild added fully-signed heterogeneous-approved and removed pending-signatures heterogeneous-pending labels Jan 29, 2025

cmsbuild added orp-approved and removed orp-pending labels Jan 29, 2025

cmsbuild merged commit 878e0b4 into cms-sw:master Jan 29, 2025
15 checks passed

This was referenced Jan 29, 2025

Remove unused variable in TrackingManagerHelper.icc #47200

Merged

[ROOTMaster] Updated root to tip of branch master cms-sw/cmsdist#9643

Open

Remove unused variable in TestAssociator.cc #47199

Merged

VourMa mentioned this pull request Jan 30, 2025

Follow-up tasks for the LST algorithm #46746

Open
